180 research outputs found

    A two-stage approach for improved prediction of residue contact maps

    BACKGROUND: Protein topology representations such as residue contact maps are an important intermediate step towards ab initio prediction of protein structure. Although improvements have occurred over recent years, the problem of accurately predicting residue contact maps from primary sequences is still largely unsolved. Among the reasons for this are the unbalanced nature of the problem (with far fewer examples of contacts than non-contacts), the formidable challenge of capturing long-range interactions in the maps, and the intrinsic difficulty of mapping one-dimensional input sequences into two-dimensional output maps. In order to alleviate these problems and achieve improved contact map predictions, in this paper we split the task into two stages: the prediction of a map's principal eigenvector (PE) from the primary sequence; and the reconstruction of the contact map from the PE and primary sequence. Predicting the PE from the primary sequence amounts to mapping a vector into a vector. This task is less complex than mapping vectors directly into two-dimensional matrices, since the size of the problem is drastically reduced and so is the length scale of the interactions that need to be learned. RESULTS: We develop architectures composed of ensembles of two-layered bidirectional recurrent neural networks to classify the components of the PE into 2, 3 and 4 classes from protein primary sequence, predicted secondary structure, and hydrophobicity interaction scales. Our predictor, tested on a non-redundant set of 2171 proteins, achieves classification performance of up to 72.6%, 16% above a baseline statistical predictor. We design a system for the prediction of contact maps from the predicted PE. Our results show that predicting maps through the PE yields sizeable gains, especially for long-range contacts, which are particularly critical for accurate protein 3D reconstruction. 
The final predictor's accuracy on a non-redundant set of 327 targets is 35.4% and 19.8% for minimum contact separations of 12 and 24, respectively, when the top length/5 contacts are selected. On the 11 CASP6 Novel Fold targets we achieve similar accuracies (36.5% and 19.7%). This compares favourably with the best automated predictors at CASP6. CONCLUSION: Our final system for contact map prediction achieves state-of-the-art performance, and may provide valuable constraints for improved ab initio prediction of protein structures. A suite of predictors of structural features, including the PE, and PE-based contact maps, is available at
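
To make the notion of a contact map's principal eigenvector (PE) concrete, the following is a minimal sketch rather than the authors' method. It assumes a binary contact map built from residue coordinates with an 8 Å cutoff (the cutoff, the function names and the toy coordinates are all illustrative assumptions) and extracts the PE by plain power iteration.

```python
import math

def contact_map(coords, threshold=8.0):
    """Binary contact map: 1 where two distinct residues lie within `threshold` angstroms.
    The 8 A default is an assumption made for this sketch."""
    n = len(coords)
    return [[1 if i != j and math.dist(coords[i], coords[j]) <= threshold else 0
             for j in range(n)] for i in range(n)]

def principal_eigenvector(matrix, iterations=200):
    """Approximate the eigenvector of the largest eigenvalue by power iteration.
    Iterating on (A + I) instead of A avoids oscillation on bipartite maps
    and leaves the eigenvectors unchanged."""
    n = len(matrix)
    v = [1.0] * n
    for _ in range(iterations):
        w = [v[i] + sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Toy example: four hypothetical residues on a line, 5 A apart,
# so only sequence neighbours are in contact.
toy_coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (15.0, 0.0, 0.0)]
toy_map = contact_map(toy_coords)
toy_pe = principal_eigenvector(toy_map)
print(toy_pe)
```

The PE has one component per residue, which is why predicting it from sequence is a vector-to-vector problem; in this toy chain the two inner residues, which have more contacts, receive the larger components.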

    An adaptive model for learning molecular endpoints

    I will describe a recursive neural network that deals with undirected graphs, and its application to predicting property labels or activity values of small molecules. The model is entirely general, in that it can process any undirected graph with a finite number of nodes by factorising it into a number of directed graphs with the same skeleton. The model's only input in the applications I will present is the graph representing the chemical structure of the molecule. In spite of its simplicity, the model outperforms or matches the state of the art in three of the four tasks, and in the fourth is outperformed only by a method resorting to a very problem-specific feature

    DISULFIND: a disulfide bonding state and cysteine connectivity prediction server

    DISULFIND is a server for predicting the disulfide bonding state of cysteines and their disulfide connectivity starting from sequence alone. Optionally, disulfide connectivity can be predicted from sequence and a bonding state assignment given as input. The output is a simple visualization of the assigned bonding state (with confidence degrees) and the most likely connectivity patterns. The server is available at

    The Ebola virus disease outbreak in Tonkolili district, Sierra Leone: a retrospective analysis of the Viral Haemorrhagic Fever surveillance system, July 2014–June 2015

    In Sierra Leone, the Ebola virus disease (EVD) outbreak occurred with substantial differences between districts, with some not even affected. To monitor the epidemic, a community event-based surveillance system was set up, collecting data into the Viral Haemorrhagic Fever (VHF) database. We analysed the VHF database of Tonkolili district to describe the epidemiology of the EVD outbreak during July 2014–June 2015 (data availability). Multivariable analysis was used to identify risk factors for EVD, fatal EVD and barriers to healthcare access, by comparing EVD-positive vs. EVD-negative cases. Key performance indicators for EVD response were also measured. Overall, 454 EVD-positive cases were reported. At multivariable analysis, the odds of EVD were higher among those reporting contacts with an EVD-positive/suspected case (odds ratio (OR) 2.47; 95% confidence interval (CI) 2.44–2.50; P < 0.01) and those attending funerals (OR 1.02; 95% CI 1.01–1.04; P < 0.01). EVD cases from Kunike chiefdom had lower odds of death (OR 0.22; 95% CI 0.08–0.44; P < 0.01) and were also more likely to be hospitalised (OR 2.34; 95% CI 1.23–4.57; P < 0.05). Only 25.1% of alerts were generated within 1 day from symptom onset. EVD preparedness and response plans for Tonkolili should include social-mobilisation activities targeting Ebola/knowledge-attitudes-practice during funeral attendance, to avoid contact with suspected cases and to increase awareness of EVD symptoms, in order to reduce delays from symptom onset to alert generation and consequently improve the promptness of the outbreak response

    Spritz: a server for the prediction of intrinsically disordered regions in protein sequences using kernel machines

    Intrinsically disordered proteins have long stretches of their polypeptide chain that do not adopt a single native structure composed of stable secondary and tertiary structure in the absence of binding partners. The prediction of intrinsically disordered regions in proteins from sequence is of increasing interest, as many such regions are discovered in complete genome sequences and important functional roles are associated with them. We have developed a machine learning approach based on two support vector machines (SVMs) to discriminate disordered regions from sequence. The SVMs are trained and benchmarked on two sets, representing long and short disordered regions. A preliminary version of Spritz was shown to perform consistently well at the recent biennial CASP-6 experiment [Critical Assessment of Techniques for Protein Structure Prediction (CASP), 2004]. The fully developed Spritz method is freely available as a web server at and

    Vaccine coverage and determinants of incomplete vaccination in children aged 12-23 months in Dschang, West Region, Cameroon: a cross-sectional survey during a polio outbreak

    Inadequate immunization coverage with increased risk of vaccine-preventable disease outbreaks remains a problem in Africa. Moreover, different factors contribute to incomplete vaccination status. This study was performed in Dschang (West Region, Cameroon), during the polio outbreak that occurred in October 2013, in order to estimate the immunization coverage among children aged 12–23 months, to identify determinants of incomplete vaccination status and to assess the risk of poliovirus spread in the study population. Methods: A cross-sectional household survey was conducted in November-December 2013, using the WHO two-stage sampling design. An interviewer-administered questionnaire was used to obtain information from consenting parents of children aged 12–23 months. Vaccination coverage was assessed by vaccination card and parents’ recall. Chi-square test and multilevel logistic regression model were used to identify the determinants of incomplete immunization status. Statistical significance was set at p [...] 90 %, and 73.4 % of children completed the recommended vaccinations before 1 year of age. In the final multilevel logistic regression model, factors significantly associated with incomplete immunization status were: retention of immunization card (AOR: 7.89; 95 % CI: 1.08–57.37), lower mothers’ utilization of antenatal care (ANC) services (AOR: 1.25; 95 % CI: 1.07–63.75), being the ≥3rd born child in the family (AOR: 425.4; 95 % CI: 9.6–18,808), younger mothers’ age (AOR: 49.55; 95 % CI: 1.59–1544), parents’ negative attitude towards immunization (AOR: 20.2; 95 % CI: 1.46–278.9), and poorer parents’ exposure to information on vaccination (AOR: 28.07; 95 % CI: 2.26–348.1). Longer distance from the vaccination centers was marginally significant (p=0.05). Conclusion: Vaccination coverage was high; however, 1 out of 7 children was partially vaccinated, and 1 out of 4 did not complete the recommended vaccinations on time. 
In order to improve the immunization coverage, it is necessary to strengthen ANC services, and to improve parents’ information and attitude towards immunization, targeting younger parents and families living far away from vaccination centers, using appropriate communication strategies. Finally, the estimated OPV-3 coverage is reassuring in relation to the ongoing polio outbreak

    Ab initio and template-based prediction of multi-class distance maps by two-dimensional recursive neural networks

    Background: Prediction of protein structures from their sequences is still one of the open grand challenges of computational biology. Some approaches to protein structure prediction, especially ab initio ones, rely to some extent on the prediction of residue contact maps. Residue contact map predictions have been assessed at the CASP competition for several years now. Although it has been shown that exact contact maps generally yield correct three-dimensional structures, this is true only at a relatively low resolution (3–4 Å from the native structure). Another known weakness of contact maps is that they are generally predicted ab initio, that is, without exploiting information about potential homologues of known structure. Results: We introduce a new class of distance restraints for protein structures: multi-class distance maps. We show that Cα trace reconstructions based on 4-class native maps are significantly better than those from residue contact maps. We then build two predictors of 4-class maps based on recursive neural networks: one ab initio, relying on the sequence and on evolutionary information; one template-based, in which homology information to known structures is provided as a further input. We show that virtually any level of sequence similarity to structural templates (down to less than 10%) yields more accurate 4-class maps than the ab initio predictor. We show that template-based predictions by recursive neural networks are consistently better than the best template and than a number of combinations of the best available templates. We also extract binary residue contact maps at an 8 Å threshold (as per CASP assessment) from the 4-class predictors and show that the template-based version is also more accurate than the best template and consistently better than the ab initio one, down to very low levels of sequence identity to structural templates. 
Furthermore, we test both ab initio and template-based 8 Å predictions on the CASP7 targets using a pre-CASP7 PDB, and find that both predictors are state-of-the-art, with the template-based one far outperforming the best CASP7 systems if templates with sequence identity to the query of 10% or better are available. Although this is not the main focus of this paper, we also report on reconstructions of Cα traces based on both ab initio and template-based 4-class map predictions, showing that the latter are generally more accurate even when homology is dubious. Conclusion: Accurate predictions of multi-class maps may provide valuable constraints for improved ab initio and template-based prediction of protein structures, naturally incorporate multiple templates, and yield state-of-the-art binary maps. Predictions of protein structures and 8 Å contact maps based on the multi-class distance map predictors described in this paper are freely available to academic users at the URL http://distill.ucd.ie/. Science Foundation Ireland; Health Research Board; UCD President's Award 2004
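
As an illustration of what a multi-class distance map is and how a binary 8 Å contact map can be read off it, here is a minimal sketch. Only the 8 Å contact cutoff comes from the abstract; the other two class boundaries, the function names and the toy coordinates are hypothetical choices made for this example, and the sketch works on known coordinates rather than on predictions.

```python
import math
from bisect import bisect_left

# Class boundaries in angstroms. Only the 8 A cutoff is taken from the text;
# 13 A and 19 A are hypothetical values chosen for illustration.
THRESHOLDS = (8.0, 13.0, 19.0)

def distance_class_map(coords, thresholds=THRESHOLDS):
    """Map every residue pair to a class index 0..len(thresholds).
    Class 0 holds distances <= thresholds[0]; the last class is open-ended."""
    n = len(coords)
    return [[bisect_left(thresholds, math.dist(coords[i], coords[j]))
             for j in range(n)] for i in range(n)]

def to_binary_contacts(class_map):
    """Collapse a multi-class map to a binary contact map: class 0 means contact.
    The diagonal (distance 0) is therefore marked as a contact too."""
    return [[1 if c == 0 else 0 for c in row] for row in class_map]

# Toy example: five hypothetical residues on a line, 5 A apart.
toy_coords = [(5.0 * k, 0.0, 0.0) for k in range(5)]
toy_classes = distance_class_map(toy_coords)
toy_contacts = to_binary_contacts(toy_classes)
print(toy_classes[0])   # [0, 0, 1, 2, 3]
print(toy_contacts[0])  # [1, 1, 0, 0, 0]
```

The multi-class map keeps coarse distance information for pairs that a binary map would simply label as non-contacts, which is the extra restraint the abstract refers to.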

    Ab initio and homology based prediction of protein domains by recursive neural networks

    Background: Proteins, especially larger ones, are often composed of individual evolutionary units, domains, which have their own function and structural fold. Predicting domains is an important intermediate step in protein analyses, including the prediction of protein structures. Results: We describe novel systems for the prediction of protein domain boundaries powered by Recursive Neural Networks. The systems rely on a combination of primary sequence and evolutionary information, predictions of structural features such as secondary structure, solvent accessibility and residue contact maps, and structural templates, both annotated for domains (from the SCOP dataset) and unannotated (from the PDB). We gauge the contribution of contact maps, and PDB and SCOP templates independently and for different ranges of template quality. We find that accurately predicted contact maps are informative for the prediction of domain boundaries, while the same is not true for contact maps predicted ab initio. We also find that gap information from PDB templates is informative, but, not surprisingly, less so than SCOP annotations. We test both systems trained on templates of all qualities, and systems trained only on templates of marginal similarity to the query (less than 25% sequence identity). While the first batch of systems produces near-perfect predictions in the presence of fair to good templates, the second batch outperforms or matches ab initio predictors down to essentially any level of template quality. We test all systems in 5-fold cross-validation on a large non-redundant set of multi-domain and single-domain proteins. The final predictors are state-of-the-art, with a template-less prediction boundary recall of 50.8% (precision 38.7%) within ± 20 residues and a single-domain recall of 80.3% (precision 78.1%). 
The SCOP-based predictors achieve a boundary recall of 74% (precision 77.1%), again within ± 20 residues, and classify single-domain proteins as such in over 85% of cases, when we allow a mix of bad and good quality templates. If we only allow marginal templates (max 25% sequence identity to the query) the scores remain high, with boundary recall and precision of 59% and 66.3%, and 80% of all single-domain proteins predicted correctly. Conclusion: The systems presented here may prove useful in large-scale annotation of protein domains in proteins of unknown structure. The methods are available as public web servers at the address: http://distill.ucd.ie/shandy/ and we plan on running them on a multi-genomic scale and making the results public in the near future. Science Foundation Ireland; Health Research Board; UCD President's Award 2004
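
The boundary recall and precision "within ± 20 residues" quoted above can be read as follows; this is a sketch under an assumed matching rule, not the paper's exact evaluation protocol. It assumes a true boundary counts as recalled if any predicted boundary falls within the tolerance, and a predicted boundary counts as correct if any true boundary falls within the tolerance. The function name and the example positions are hypothetical.

```python
def boundary_scores(predicted, true, tolerance=20):
    """Return (recall, precision) for domain boundary positions.
    A boundary is matched if its counterpart list has an entry within
    `tolerance` residues; this matching rule is an assumption of the sketch."""
    recalled = sum(1 for t in true
                   if any(abs(t - p) <= tolerance for p in predicted))
    correct = sum(1 for p in predicted
                  if any(abs(p - t) <= tolerance for t in true))
    recall = recalled / len(true) if true else 0.0
    precision = correct / len(predicted) if predicted else 0.0
    return recall, precision

# Hypothetical protein with true boundaries at residues 100 and 250,
# and three predicted boundaries.
recall, precision = boundary_scores(predicted=[90, 110, 400], true=[100, 250])
print(recall, precision)
```

In this example one of the two true boundaries is found (recall 0.5) and two of the three predictions lie near a true boundary (precision 2/3), which shows how the two figures can diverge.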

    Distill: a suite of web servers for the prediction of one-, two- and three-dimensional structural features of proteins

    BACKGROUND: We describe Distill, a suite of servers for the prediction of protein structural features: secondary structure; relative solvent accessibility; contact density; backbone structural motifs; residue contact maps at 6, 8 and 12 Å; coarse protein topology. The servers are based on large-scale ensembles of recursive neural networks and trained on large, up-to-date, non-redundant subsets of the Protein Data Bank. Together with structural feature predictions, Distill includes a server for prediction of Cα traces for short proteins (up to 200 amino acids). RESULTS: The servers are state-of-the-art, with secondary structure predicted correctly for nearly 80% of residues (currently the top performance on EVA), 2-class solvent accessibility nearly 80% correct, and contact maps exceeding 50% precision on the top non-diagonal contacts. A preliminary implementation of the predictor of protein Cα traces featured among the top 20 Novel Fold predictors at the last CASP6 experiment as group Distill (ID 0348). The majority of the servers, including the Cα trace predictor, now take into account homology information from the PDB, when available, resulting in greatly improved reliability. CONCLUSION: All predictions are freely available through a simple joint web interface and the results are returned by email. In a single submission the user can send protein sequences for a total of up to 32k residues to all or a selection of the servers. Distill is accessible at the address:

    Hospitalization for pneumonia is associated with decreased 1-year survival in patients with type 2 diabetes: results from a prospective cohort study

    Diabetes mellitus is a frequent comorbid condition among patients with pneumonia living in the community. The aim of our study is to evaluate the impact of hospitalization for pneumonia on early (30-day) and late (1-year) mortality in patients with type 2 diabetes mellitus. This was a prospective comparative cohort study of 203 patients with type 2 diabetes hospitalized for pneumonia versus 206 patients with type 2 diabetes hospitalized for other noninfectious causes (39.3% decompensated diabetes, 21.4% cerebrovascular diseases, 9.2% renal failure, 8.3% acute myocardial infarction, and 21.8% other causes) from January 2012 to December 2013 at Policlinico Umberto I (Rome). Enrolled patients were followed up to discharge and up to 1 year after initial hospital admission or death. Compared to control patients, those admitted for pneumonia showed a higher 30-day (10.8% vs 1%, P<0.001) and 1-year mortality rate (30.3% vs 16.8%, P<0.001). Compared to survivors, nonsurvivor patients with pneumonia had a higher incidence of moderate to severe chronic kidney disease, hemodialysis, and malnutrition, were more likely to present with a mental status deterioration, and had a higher number of cardiovascular events during the follow-up period. Cox regression analysis found age, Charlson comorbidity index, pH<7.35 at admission, hemodialysis, and hospitalization for pneumonia to be variables independently associated with mortality. Hospitalization for pneumonia is associated with decreased 1-year survival in patients with type 2 diabetes, and appears to be a major determinant of long-term outcome in these patients